IN THE SPECIFICATION



Paragraph [0032] is sought to be rewritten as follows:



Existing rasterizers are fixed function devices. With the advent of multi
texturing it has become impossible to cast sufficiently flexibility into a fixed
function device, particularly when up to 8 textures can be combined in one
fragment. Microsoft have recognized this in DX8 and are pushing
programmable shading languages as the way forward. Clearly the 3D chip
community [[have]] has no choice [[by]] but to go along with this.



Paragraph [0051] is sought to be rewritten as follows:



The more data the memory controller can read or write per request the 
more efficient it will be able to run. Needless to say you should strive to make 
use of all the data in the transfer and not some small fraction of it. Tiles are 
also visited in an order aimed at promoting optimum memory usage, although
the Memory Controller can [[hid]] hide the page break cost in all transfers
larger than one (byte wide) tile. More extensive caching techniques are used
to smooth out demand peaks and to allow some degree of pre-fetching to
occur. 



Amendment - Serial No. 10/086,980 Page 9

PAGE 12/47 * RCVD AT 5/4/2004 2:58:27 PM [Eastern Daylight Time] * SVR:USPTO-EFXRF-1/1 * DNIS:8729306 * CSID:972 380 4445 * DURATION (mm-ss):00-10



May 04 04 02:10p    Groover & Associates    972-380-4445    p. 13



Paragraph [0061] is sought to be rewritten as follows:



As any context switchable state flows through into the rasterizer part it
goes through [[is]] the Context Unit. This unit caches all context data and
maintains a copy in the local memory. A small cache is needed so that
frequently updating values such as mode registers do not cause a significant
amount of memory traffic. When a context switch is needed the cache is
flushed and the new context record read from memory and converted into a
message stream to update downstream units. The message tags will be
allocated to allow simple decode and mapping into the context record for both
narrow and wide messages. Some special cases on capturing the context as
well as restoring it will be needed to look after the cases where multiple words
are mapped to the same tag, for example as used when program loading. One
of the side effects of this is to be able to remove the context logic in each unit
and the readback mechanisms (you could just read directly from context
record in memory). Also the previous context mechanisms are problematic in
the texture pipes (because the message stream doesn't run through the pipes)
and this solution handles this transparently. This will be very fast as changing
context will only require a small amount of state to be save (from the cache)
and the restore will be at 1 message per cycle (even for wide messages). By
allowing wide message loading of the LUTs, WCS, etc. the context restore
could probably be reduced to 500 cycles or 3 microseconds.



Paragraph [0089] is sought to be rewritten as follows:






Output DMA: The output DMA is mainly used to load data from the
core into host memory. Typical uses of this [[is]] are for image upload and
returning current vertex state. The output DMA is initiated via messages
which pass through the core and arrive via the Host Out Unit. This allows any
number of output DMA requests to be queued.

Paragraph [0092] is sought to be rewritten as follows:

The Current Parameter Unit's main task it to allow a parameter such as a
color or a texture to be supplied for every vertex even when it is not included
in a DMA buffer. This allows vertices in OpenGL to inherit previously
defined parameters without being forced to supply them on every vertex.
Vertex arrays and vertex buffers always supply the same set of predefined
parameters per vertex. Always supplying 16 sets of parameters on every
vertex will [[reducing]] reduce performance considerably so the Current
Parameter Unit tracks how many times a parameter is forwarded on and stops
appending any missing parameters to a vertex once it knows the Vertex
Shading Unit has copies in all its input buffers.

Paragraph [0096] is sought to be rewritten as follows:

The coordinate results are passed to the Vertex Machine Unit via the
message stream and the 16 parameter results are passed directly to the
Geometry Unit on a private bus. The two output ports allow for a higher
vertex throughput.



Paragraph [0098] is sought to be rewritten as follows:

The Cull Unit caches the window coordinates for the 16 vertices and
when a Geom* message arrives the unit will use the cached window
coordinates to test clip against the viewing frustum and, for triangles, do a
back face test. Any primitives failing these tests (if enabled) will be discarded.
Any primitives passing these tests are passed on, however if the clip test is
inconclusive the primitive is further tested against the guard band limits. A
pass against these new limits means that it will be left to the rasterizer to clip
the primitive while it is being filled - it can do this very efficiently and spends
very little time in 'out of view' regions. A fail against the guard band limits or
the near, far or user clip plane will cause the primitive to be geometrically
clipped in the Geometry Unit.
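
The decision sequence described above can be illustrated with a short sketch. It is not taken from the specification: the use of edge outcodes, the names `Outcome`, `outcode` and `classify`, and the rectangle representation `(xmin, ymin, xmax, ymax)` are all assumptions made for illustration, and the back face test is omitted.

```python
from enum import Enum


class Outcome(Enum):
    DISCARD = "discard"
    PASS_ON = "pass on unclipped"
    RASTERIZER_CLIP = "rasterizer clips while filling"
    GEOMETRY_CLIP = "clip geometrically"


def outcode(x, y, box):
    """Bit mask of the box edges that (x, y) lies outside of (assumed test)."""
    xmin, ymin, xmax, ymax = box
    return ((x < xmin) << 0) | ((x > xmax) << 1) | ((y < ymin) << 2) | ((y > ymax) << 3)


def classify(vertices, view, guard_band, crosses_depth_or_user_plane=False):
    """Classify a primitive from its cached window coordinates.

    vertices: list of (x, y) window coordinates.
    view, guard_band: (xmin, ymin, xmax, ymax) rectangles, guard band larger.
    crosses_depth_or_user_plane: hypothetical flag standing in for the
    near, far and user clip plane tests, which are not modelled here.
    """
    codes = [outcode(x, y, view) for x, y in vertices]
    all_outside_one_edge = codes[0]
    any_outside = 0
    for code in codes:
        all_outside_one_edge &= code
        any_outside |= code

    if all_outside_one_edge:
        return Outcome.DISCARD          # wholly outside the view: fails the clip test
    if crosses_depth_or_user_plane:
        return Outcome.GEOMETRY_CLIP    # near, far or user plane needs real clipping
    if not any_outside:
        return Outcome.PASS_ON          # wholly inside the view
    # Inconclusive: fall back to the guard band limits.
    if all(outcode(x, y, guard_band) == 0 for x, y in vertices):
        return Outcome.RASTERIZER_CLIP
    return Outcome.GEOMETRY_CLIP
```

For example, with a 100 x 100 view and a guard band extending 1000 units beyond it, a triangle that pokes slightly past the left edge is left to the rasterizer, while one with a vertex far outside the guard band is sent for geometric clipping.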
Paragraph [0111] is sought to be rewritten as follows:

The Parameter Set Up Unit is replicated in each texture pipe so it only
does the set up for primitives which reach that pipe. The parameters handled
by this unit are 8 four component color values and 8 four component texture
values. For small primitives the performance of the 4 Parameter Set Up Units
will balance the single Depth Set Up Unit. The vertex store in this unit is
arranged as a circular buffer which can hold 48 parameters. This is
considerably smaller than the 256 parameter store required to hold 16
parameters for 16 vertices. In most cases there will only be a few parameters
per vertex so we get the benefit of being able to hold 16 vertices, but as the
number of parameters per vertex increased then the total number of vertices
which can be held will reduce. In the limit we can still hold all 16 parameters
for three vertices which is the minimum number of vertices necessary to set up
the plane equations. Color parameters can be marked as being 'flat' when flat
shading is enabled.
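
The capacity trade-off in the vertex store can be made concrete with a little arithmetic. The sketch below assumes that the number of vertices held is simply the 48 parameter store divided by the parameters per vertex, rounded down and capped at 16; the text gives the end points (16 vertices with few parameters, three vertices with 16 parameters) but not the exact rule, and the function name is hypothetical.

```python
STORE_PARAMETERS = 48   # circular buffer capacity stated in the text
MAX_VERTICES = 16       # upper bound on vertices stated in the text


def vertices_held(params_per_vertex):
    """Vertices that fit in the store (assumed: floor division, capped at 16)."""
    if not 1 <= params_per_vertex <= 16:
        raise ValueError("expected between 1 and 16 parameters per vertex")
    return min(MAX_VERTICES, STORE_PARAMETERS // params_per_vertex)


for n in (1, 3, 4, 8, 16):
    print(n, "parameters per vertex ->", vertices_held(n), "vertices")
```

Under this assumption the store holds the full 16 vertices for up to 3 parameters per vertex and falls to 3 vertices at 16 parameters per vertex, which matches the limit case described above.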



Paragraph [0113] is sought to be rewritten as follows:



All parameter calculations are done by evaluating the plane equation 
directly rather than using DDAs. This allows the tiles all primitives are
decomposed into to be visited in any order and evaluation for fragment
positions within a tile to be done in parallel (when needed). The origin of the 
plane equation is relocated from (0, 0) to the upper left fragment of a tile 
which overlaps the primitive [[so]] to constrain the dynamic range of the c 
value in the plane equation. 
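
A minimal sketch of direct plane equation evaluation with a relocated origin follows. It assumes a plane of the form p(x, y) = a*x + b*y + c; the function names and the tile size parameter are illustrative and do not come from the specification.

```python
def relocate_origin(a, b, c, tile_x, tile_y):
    """Return coefficients for the same plane with origin at (tile_x, tile_y).

    Only c changes: c' = a*tile_x + b*tile_y + c, which is the parameter
    value at the tile's upper left fragment.
    """
    return a, b, a * tile_x + b * tile_y + c


def evaluate_tile(a, b, c_tile, size):
    """Evaluate every fragment of a size x size tile independently.

    Each entry depends only on its own (dx, dy) offset, so the entries
    could be computed in any order or in parallel, unlike a DDA walk.
    """
    return [[a * dx + b * dy + c_tile for dx in range(size)]
            for dy in range(size)]


a, b, c = 2, 3, 5                       # example coefficients
_, _, c_tile = relocate_origin(a, b, c, tile_x=8, tile_y=16)
tile = evaluate_tile(a, b, c_tile, size=4)
```

Here the fragment at offset (1, 2) in the tile gets the same value as evaluating the original plane at (9, 18), while the offsets used in the multiplies stay small.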
Paragraph [0144] is sought to be rewritten as follows:



Antialiased points are processed in a different way as it is not possible
to use the edge function generators without making them very expensive or
converting the point to [[an]] a polygon. The method used [[it]] is to calculate
the distance from each subpixel sample point in the point's bounding box to the
point's center and compare this to the point's radius. Subpixel sample points
with a distance greater than the radius do not contribute to a pixel's coverage.
The cost of this is kept low by only allowing small radius points hence the
distance calculation is a small multiply and by taking a cycle per subpixel
sample per pixel within the bounding box. This will limit the performance on
this primitive, however this is not a performance critical operation but does
need to be supported as the software has no way to substitute alternative
rendering commands due to polymode behavior.
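
The coverage test for antialiased points can be sketched as below. The regular grid of subpixel sample positions, the default of 4 samples per axis, and the comparison of squared distances (to avoid a square root) are assumptions for illustration; the text specifies only that samples farther from the center than the radius do not contribute.

```python
import math


def point_coverage(cx, cy, radius, samples_per_axis=4):
    """Return {(px, py): coverage fraction} for pixels in the bounding box.

    Sample positions are assumed to lie on a regular grid inside each pixel.
    """
    n = samples_per_axis
    r2 = radius * radius
    coverage = {}
    for py in range(math.floor(cy - radius), math.floor(cy + radius) + 1):
        for px in range(math.floor(cx - radius), math.floor(cx + radius) + 1):
            hits = 0
            for sy in range(n):
                for sx in range(n):
                    dx = px + (sx + 0.5) / n - cx
                    dy = py + (sy + 0.5) / n - cy
                    # Samples beyond the radius do not contribute.
                    if dx * dx + dy * dy <= r2:
                        hits += 1
            if hits:
                coverage[(px, py)] = hits / (n * n)
    return coverage


cov = point_coverage(0.5, 0.5, 1.0)
```

The nested sample loops correspond to the one cycle per subpixel sample per pixel mentioned above, which is why the cost stays bounded only for small radii.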



Paragraph [0154] is sought to be rewritten as follows:



Each texture pipe works autonomously and will compute the filtered
texture values for the valid fragments in the tile it has been given. It will do
this for up to eight sets of textures and pass the results to the Shader Unit in the
pipe, and potentially back to the Texture Coordinate Unit for bump mapping.
Processing within the texture pipe is done as a mixture of SIMD units (Texture
Coordinate Unit and Shading Unit) or one fragment at a time (Primary Texture
Cache Unit and Texture Filter Unit) depending on how hard it is to parallelize
the required operations.






Paragraph [0157] is sought to be rewritten as follows:




The Texture Address Unit calculates the address in memory where the
texel data resides. This operation is shared by all texture pipes (to [[saves]]
save gates by not duplicating it), and in any case it only needs to calculate
addresses as fast as the memory/secondary cache can service them. The
texture to read is identified by a 3 bit texture ID, its coordinate (i, j, k), a
map level and a cube face. This together with local registers [[allow]] allows a
memory address to be calculated. This unit only works in logical addresses
and the translation to physical addresses and handling any page faulting is
done in the Memory Controller. The layout of texture data in cube maps and
mip map chains is now fully specified algorithmically so just the base address
needs to be provided. The maximum texture map size is 8Kx8K and they do
not have to be square or a power of two in size.



Paragraph [0168] is sought to be rewritten as follows:

The primary cache is divided into two banks and each bank has 16
cache lines, each holding 16 texels in a 4x4 patch. The search is fully
associative and 8 queries per cycle (4 in each bank) can be made. The
replacement policy is LRU, but only on the set of cache lines not referenced by
the current fragment or fragments in the latency FIFO. The banks are assigned
so even mip map levels or 3D slices are in one bank while odd ones are in the
other. The search key is based on the texel's index and texture ID not address
in memory (saves having to compute 8 addresses). The cache coherency is
only intended to work within a sub tile or maybe a tile and never between
tiles.[[2]]
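
The replacement policy described above, LRU restricted to lines that are not currently referenced, can be sketched in software as follows. The class and function names, the use of an ordered dictionary, and the form of the key are assumptions for illustration; the hardware search is fully associative and is not modelled here.

```python
from collections import OrderedDict


def bank_for(mip_level_or_slice):
    """Even levels or slices go to bank 0, odd ones to bank 1."""
    return mip_level_or_slice & 1


class CacheBank:
    """One bank of the primary cache, keyed by texel index and texture ID."""

    def __init__(self, num_lines=16):
        self.num_lines = num_lines
        self.lines = OrderedDict()      # least recently used key first

    def lookup(self, key, pinned=frozenset()):
        """Return True on a hit. On a miss, allocate a line for key.

        pinned: keys referenced by the current fragment or by fragments
        still in the latency FIFO; these are never chosen as victims.
        """
        if key in self.lines:
            self.lines.move_to_end(key)  # mark as most recently used
            return True
        if len(self.lines) >= self.num_lines:
            victim = next((k for k in self.lines if k not in pinned), None)
            if victim is None:
                raise RuntimeError("every cache line is pinned")
            del self.lines[victim]
        self.lines[key] = None
        return False


bank = CacheBank(num_lines=2)           # tiny bank to make eviction visible
bank.lookup("a")
bank.lookup("b")
bank.lookup("c", pinned={"a"})          # "a" is oldest but pinned, so "b" goes
```

A key here might be a tuple such as (texture ID, map level, patch column, patch row); that layout is a hypothetical choice, since the text says only that the key is based on the texel's index and texture ID rather than its memory address.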
Paragraph [0205] is sought to be rewritten as follows:



Additional disclosure is found in nonprovisional applications Ser. No.

[[ ]] 10/071,895 filed Feb. 8, 2002 (TD-164), Ser. No. [[ ]]

10/071,896 filed Feb. 8, 2002 (TD-165), and Ser. No. [[ ]] 10/080,284

filed Feb. 20, 2002 (TD-169), all commonly owned, copending with the
present application, and hereby incorporated by reference, and in provisional 
applications 60/267,265, 60/267,266, 60/269,462, 60/269,463, 60/269,428,
60/269,802, 60/269,935, 60/271,851, 60/271,795, 60/271,796, 60/272,125, and
60/272,516, various of which are referenced in the nonprovisional filings cited
above, and all of which are hereby incorporated by reference.